Speech Recognition Front End Without Information Loss

نویسندگان

  • Matthew Ager
  • Zoran Cvetkovic
  • Peter Sollich
چکیده

Phoneme classification is investigated for linear feature domains with the aim of improving robustness to additive noise. In linear feature domains noise adaptation is exact, potentially leading to more accurate classification than representations involving non-linear processing and dimensionality reduction. A generative framework is developed for isolated phoneme classification using linear features. Initial results are shown for representations consisting of concatenated frames from the centre of the phoneme, each containing f frames. As phonemes have variable duration, no single f is optimal for all phonemes, therefore an average is taken over models with a range of values of f . Results are further improved by including information from the entire phoneme and transitions. In the presence of additive noise, classification in this framework performs better than an analogous PLP classifier, adapted to noise using cepstral mean and variance normalisation, below 18dB SNR. Finally we propose classification using a combination of acoustic waveform and PLP log-likelihoods. The combined classifier performs uniformly better than either of the individual classifiers across all noise levels.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

SNR-based mask compensation for computational auditory scene analysis applied to speech recognition in a car environment

In this paper, we propose a computational auditory scene analysis (CASA)–based front–end for two–microphone speech recognition in a car environment. One of the important issues associated with CASA is the accurate estimation of mask information for target speech separation within multiple microphone noisy speech. For such a task, the time–frequency mask information is compensated through the si...

متن کامل

Noise robust hands-free speech recognition using microphone array and Kalman filter as front-end system of conversational TV

In this paper, we investigate hands-free speech recognition as front-end system of conversational TV. The conversational TV is one of machine conversation systems to retrieve the interesting information by inquiring it to the TV. To realize the natural machine conversation without consciousness of microphone, hands-free speech recognition is required. In the handsfree speech recognition system,...

متن کامل

Performance improvement of a bitstream-based front-end for wireless speech recognition in adverse environments

In this paper, we propose a feature enhancement algorithm for wireless speech recognition in adverse acoustic environments. A speech recognition system is realized at the network side of a wireless communications system and feature parameters are extracted directly from the bitstream of the speech coder employed in the system, where the feature parameters are composed of spectral envelope infor...

متن کامل

A bitstream-based front-end for wireless speech recognition on IS-136 communications system

In this paper, we propose a feature extraction method for a speech recognizer that operates in digital communication networks. The feature parameters are basically extracted by converting the quantized spectral information of a speech coder into a cepstrum. We also include the voiced/unvoiced information obtained from the bitstream of the speech coder in the recognition feature set. We performe...

متن کامل

Recognizing voice over IP: a robust front-end for speech recognition on the world wide web

The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequentl...

متن کامل

Spectral Features for Automatic Text-Independent Speaker Recognition

Front-end or feature extractor is the first component in an automatic speaker recognition system. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components (speaker modeling and pattern matching) is st...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013